Penta-Training: Clustering Ensembles with Bootstrapping of Constraints

Authors

  • Carlotta Domeniconi
  • Muna Al-Razgan
Abstract

In this paper we combine clustering ensembles and semi-supervised clustering to address the ill-posed nature of clustering. We introduce a mechanism that leverages the ensemble framework to bootstrap informative constraints directly from the data and from the various clusterings, without intervention from the user. Our approach is well suited to problems where the information available from an external source is very limited. We demonstrate the effectiveness of the proposed technique through experiments on real datasets and comparisons with other state-of-the-art semi-supervised techniques.
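The abstract does not spell out how constraints are bootstrapped, so the following is only an illustrative sketch of one common way to derive pairwise constraints from an ensemble, not the authors' method. It assumes that pairs of points grouped together in every clustering become must-link constraints and pairs never grouped together become cannot-link constraints. The function names, thresholds, and toy partitions are all hypothetical.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of clusterings in which each pair of points shares a cluster."""
    n = len(partitions[0])
    scores = {}
    for i, j in combinations(range(n), 2):
        agree = sum(1 for labels in partitions if labels[i] == labels[j])
        scores[(i, j)] = agree / len(partitions)
    return scores


def bootstrap_constraints(partitions, must_thresh=1.0, cannot_thresh=0.0):
    """Derive constraints from ensemble agreement (assumed rule, for illustration).

    Pairs with agreement >= must_thresh become must-link constraints;
    pairs with agreement <= cannot_thresh become cannot-link constraints.
    """
    scores = co_association(partitions)
    must_link = [pair for pair, s in scores.items() if s >= must_thresh]
    cannot_link = [pair for pair, s in scores.items() if s <= cannot_thresh]
    return must_link, cannot_link


# Toy ensemble: three clusterings of five points (cluster labels are arbitrary).
partitions = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
must_link, cannot_link = bootstrap_constraints(partitions)
print(must_link)    # [(0, 1)]
print(cannot_link)  # [(0, 3), (0, 4), (1, 3), (1, 4)]
```

In a pipeline of this kind, the resulting constraints would then be passed to a semi-supervised clustering algorithm in place of user-provided ones. Loosening the thresholds trades constraint reliability for coverage.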

Similar resources

Model Clustering for Neural Network Ensembles

We show that large ensembles of (neural network) models, obtained e.g. in bootstrapping or sampling from (Bayesian) probability distributions, can be effectively summarized by a relatively small number of representative models. We present a method to find representative models through clustering based on the models’ outputs on a data set. We apply the method on models obtained through bootstrap...

Clustering ensembles of neural network models

We show that large ensembles of (neural network) models, obtained e.g. in bootstrapping or sampling from (Bayesian) probability distributions, can be effectively summarized by a relatively small number of representative models. In some cases this summary may even yield better function estimates. We present a method to find representative models through clustering based on the models' outputs on...

A Novel Bootstrapping Method for Positive Datasets in Cascades of Boosted Ensembles

We present a novel method for efficiently training a face detector using large positive datasets in a cascade of boosted ensembles. We extend the successful Viola-Jones [1] framework which achieved low false acceptance rates through bootstrapping negative samples with the capability to also bootstrap large positive datasets thereby capturing more in-class variation of the target object. We achi...

Nasullah Khalid Alham

Machine learning techniques have facilitated image retrieval by automatically classifying and annotating images with keywords. Among them Support Vector Machines (SVMs) are used extensively due to their generalization properties. However, SVM training is notably a computationally intensive process especially when the training dataset is large. In this thesis distributed computing paradigms have...

Robust Data Clustering

We address the problem of robust clustering by combining data partitions (forming a clustering ensemble) produced by multiple clusterings. We formulate robust clustering under an information-theoretical framework; mutual information is the underlying concept used in the definition of quantitative measures of agreement or consistency between data partitions. Robustness is assessed by variance of...

Publication date: 2008